{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 8.1 Parsing Unix timestamps"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It's not obvious how to deal with Unix timestamps in pandas -- it took me quite a while to figure this out. The file we're using here is a popularity-contest file I found on my system at `/var/log/popularity-contest`.\n",
    "\n",
    "Here's an [explanation of how this file works](http://popcon.ubuntu.com/README).\n",
    "\n",
    "I'm going to hope that nothing in it is sensitive :)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Read it, and remove the last row\n",
    "popcon = pd.read_csv('../data/popularity-contest', sep=' ', )[:-1]\n",
    "popcon.columns = ['atime', 'ctime', 'package-name', 'mru-program', 'tag']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The colums are the access time, created time, package name, recently used program, and a tag"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>atime</th>\n",
       "      <th>ctime</th>\n",
       "      <th>package-name</th>\n",
       "      <th>mru-program</th>\n",
       "      <th>tag</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1387295797</td>\n",
       "      <td>1367633260</td>\n",
       "      <td>perl-base</td>\n",
       "      <td>/usr/bin/perl</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1387295796</td>\n",
       "      <td>1354370480</td>\n",
       "      <td>login</td>\n",
       "      <td>/bin/su</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1387295743</td>\n",
       "      <td>1354341275</td>\n",
       "      <td>libtalloc2</td>\n",
       "      <td>/usr/lib/x86_64-linux-gnu/libtalloc.so.2.0.7</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1387295743</td>\n",
       "      <td>1387224204</td>\n",
       "      <td>libwbclient0</td>\n",
       "      <td>/usr/lib/x86_64-linux-gnu/libwbclient.so.0</td>\n",
       "      <td>&lt;RECENT-CTIME&gt;</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1387295742</td>\n",
       "      <td>1354341253</td>\n",
       "      <td>libselinux1</td>\n",
       "      <td>/lib/x86_64-linux-gnu/libselinux.so.1</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        atime       ctime  package-name  \\\n",
       "0  1387295797  1367633260     perl-base   \n",
       "1  1387295796  1354370480         login   \n",
       "2  1387295743  1354341275    libtalloc2   \n",
       "3  1387295743  1387224204  libwbclient0   \n",
       "4  1387295742  1354341253   libselinux1   \n",
       "\n",
       "                                    mru-program             tag  \n",
       "0                                 /usr/bin/perl             NaN  \n",
       "1                                       /bin/su             NaN  \n",
       "2  /usr/lib/x86_64-linux-gnu/libtalloc.so.2.0.7             NaN  \n",
       "3    /usr/lib/x86_64-linux-gnu/libwbclient.so.0  <RECENT-CTIME>  \n",
       "4         /lib/x86_64-linux-gnu/libselinux.so.1             NaN  "
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "popcon[:5]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The magical part about parsing timestamps in pandas is that numpy datetimes are already stored as Unix timestamps. So all we need to do is tell pandas that these integers are actually datetimes -- it doesn't need to do any conversion at all.\n",
    "\n",
    "We need to convert these to ints to start:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "popcon['atime'] = popcon['atime'].astype(int)\n",
    "popcon['ctime'] = popcon['ctime'].astype(int)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Every numpy array and pandas series has a dtype -- this is usually `int64`, `float64`, or `object`. Some of the time types available are `datetime64[s]`, `datetime64[ms]`, and `datetime64[us]`. There are also `timedelta` types, similarly.\n",
    "\n",
    "We can use the `pd.to_datetime` function to convert our integer timestamps into datetimes. This is a constant-time operation -- we're not actually changing any of the data, just how pandas thinks about it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [],
   "source": [
    "popcon['atime'] = pd.to_datetime(popcon['atime'], unit='s')\n",
    "popcon['ctime'] = pd.to_datetime(popcon['ctime'], unit='s')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If we look at the dtype now, it's `<M8[ns]`. As far as I can tell `M8` is secret code for `datetime64`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "dtype('<M8[ns]')"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "popcon['atime'].dtype"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "So now we can look at our `atime` and `ctime` as dates!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>atime</th>\n",
       "      <th>ctime</th>\n",
       "      <th>package-name</th>\n",
       "      <th>mru-program</th>\n",
       "      <th>tag</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2013-12-17 15:56:37</td>\n",
       "      <td>2013-05-04 02:07:40</td>\n",
       "      <td>perl-base</td>\n",
       "      <td>/usr/bin/perl</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2013-12-17 15:56:36</td>\n",
       "      <td>2012-12-01 14:01:20</td>\n",
       "      <td>login</td>\n",
       "      <td>/bin/su</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2013-12-17 15:55:43</td>\n",
       "      <td>2012-12-01 05:54:35</td>\n",
       "      <td>libtalloc2</td>\n",
       "      <td>/usr/lib/x86_64-linux-gnu/libtalloc.so.2.0.7</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2013-12-17 15:55:43</td>\n",
       "      <td>2013-12-16 20:03:24</td>\n",
       "      <td>libwbclient0</td>\n",
       "      <td>/usr/lib/x86_64-linux-gnu/libwbclient.so.0</td>\n",
       "      <td>&lt;RECENT-CTIME&gt;</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2013-12-17 15:55:42</td>\n",
       "      <td>2012-12-01 05:54:13</td>\n",
       "      <td>libselinux1</td>\n",
       "      <td>/lib/x86_64-linux-gnu/libselinux.so.1</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                atime               ctime  package-name  \\\n",
       "0 2013-12-17 15:56:37 2013-05-04 02:07:40     perl-base   \n",
       "1 2013-12-17 15:56:36 2012-12-01 14:01:20         login   \n",
       "2 2013-12-17 15:55:43 2012-12-01 05:54:35    libtalloc2   \n",
       "3 2013-12-17 15:55:43 2013-12-16 20:03:24  libwbclient0   \n",
       "4 2013-12-17 15:55:42 2012-12-01 05:54:13   libselinux1   \n",
       "\n",
       "                                    mru-program             tag  \n",
       "0                                 /usr/bin/perl             NaN  \n",
       "1                                       /bin/su             NaN  \n",
       "2  /usr/lib/x86_64-linux-gnu/libtalloc.so.2.0.7             NaN  \n",
       "3    /usr/lib/x86_64-linux-gnu/libwbclient.so.0  <RECENT-CTIME>  \n",
       "4         /lib/x86_64-linux-gnu/libselinux.so.1             NaN  "
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "popcon[:5]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now suppose we want to look at all packages that aren't libraries.\n",
    "\n",
    "First, I want to get rid of everything with timestamp 0. Notice how we can just use a string in this comparison, even though it's actually a timestamp on the inside? That is because pandas is amazing."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [],
   "source": [
    "popcon = popcon[popcon['atime'] > '1970-01-01']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we can use pandas' magical string abilities to just look at rows where the package name doesn't contain 'lib'."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [],
   "source": [
    "nonlibraries = popcon[~popcon['package-name'].str.contains('lib')]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>atime</th>\n",
       "      <th>ctime</th>\n",
       "      <th>package-name</th>\n",
       "      <th>mru-program</th>\n",
       "      <th>tag</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>57</th>\n",
       "      <td>2013-12-17 04:55:39</td>\n",
       "      <td>2013-12-17 04:55:42</td>\n",
       "      <td>ddd</td>\n",
       "      <td>/usr/bin/ddd</td>\n",
       "      <td>&lt;RECENT-CTIME&gt;</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>450</th>\n",
       "      <td>2013-12-16 20:03:20</td>\n",
       "      <td>2013-12-16 20:05:13</td>\n",
       "      <td>nodejs</td>\n",
       "      <td>/usr/bin/npm</td>\n",
       "      <td>&lt;RECENT-CTIME&gt;</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>454</th>\n",
       "      <td>2013-12-16 20:03:20</td>\n",
       "      <td>2013-12-16 20:05:04</td>\n",
       "      <td>switchboard-plug-keyboard</td>\n",
       "      <td>/usr/lib/plugs/pantheon/keyboard/options.txt</td>\n",
       "      <td>&lt;RECENT-CTIME&gt;</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>445</th>\n",
       "      <td>2013-12-16 20:03:20</td>\n",
       "      <td>2013-12-16 20:05:04</td>\n",
       "      <td>thunderbird-locale-en</td>\n",
       "      <td>/usr/lib/thunderbird-addons/extensions/langpac...</td>\n",
       "      <td>&lt;RECENT-CTIME&gt;</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>396</th>\n",
       "      <td>2013-12-16 20:08:27</td>\n",
       "      <td>2013-12-16 20:05:03</td>\n",
       "      <td>software-center</td>\n",
       "      <td>/usr/sbin/update-software-center</td>\n",
       "      <td>&lt;RECENT-CTIME&gt;</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>449</th>\n",
       "      <td>2013-12-16 20:03:20</td>\n",
       "      <td>2013-12-16 20:05:00</td>\n",
       "      <td>samba-common-bin</td>\n",
       "      <td>/usr/bin/net.samba3</td>\n",
       "      <td>&lt;RECENT-CTIME&gt;</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>397</th>\n",
       "      <td>2013-12-16 20:08:25</td>\n",
       "      <td>2013-12-16 20:04:59</td>\n",
       "      <td>postgresql-client-9.1</td>\n",
       "      <td>/usr/lib/postgresql/9.1/bin/psql</td>\n",
       "      <td>&lt;RECENT-CTIME&gt;</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>398</th>\n",
       "      <td>2013-12-16 20:08:23</td>\n",
       "      <td>2013-12-16 20:04:58</td>\n",
       "      <td>postgresql-9.1</td>\n",
       "      <td>/usr/lib/postgresql/9.1/bin/postmaster</td>\n",
       "      <td>&lt;RECENT-CTIME&gt;</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>452</th>\n",
       "      <td>2013-12-16 20:03:20</td>\n",
       "      <td>2013-12-16 20:04:55</td>\n",
       "      <td>php5-dev</td>\n",
       "      <td>/usr/include/php5/main/snprintf.h</td>\n",
       "      <td>&lt;RECENT-CTIME&gt;</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>440</th>\n",
       "      <td>2013-12-16 20:03:20</td>\n",
       "      <td>2013-12-16 20:04:54</td>\n",
       "      <td>php-pear</td>\n",
       "      <td>/usr/share/php/XML/Util.php</td>\n",
       "      <td>&lt;RECENT-CTIME&gt;</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                  atime               ctime               package-name  \\\n",
       "57  2013-12-17 04:55:39 2013-12-17 04:55:42                        ddd   \n",
       "450 2013-12-16 20:03:20 2013-12-16 20:05:13                     nodejs   \n",
       "454 2013-12-16 20:03:20 2013-12-16 20:05:04  switchboard-plug-keyboard   \n",
       "445 2013-12-16 20:03:20 2013-12-16 20:05:04      thunderbird-locale-en   \n",
       "396 2013-12-16 20:08:27 2013-12-16 20:05:03            software-center   \n",
       "449 2013-12-16 20:03:20 2013-12-16 20:05:00           samba-common-bin   \n",
       "397 2013-12-16 20:08:25 2013-12-16 20:04:59      postgresql-client-9.1   \n",
       "398 2013-12-16 20:08:23 2013-12-16 20:04:58             postgresql-9.1   \n",
       "452 2013-12-16 20:03:20 2013-12-16 20:04:55                   php5-dev   \n",
       "440 2013-12-16 20:03:20 2013-12-16 20:04:54                   php-pear   \n",
       "\n",
       "                                           mru-program             tag  \n",
       "57                                        /usr/bin/ddd  <RECENT-CTIME>  \n",
       "450                                       /usr/bin/npm  <RECENT-CTIME>  \n",
       "454       /usr/lib/plugs/pantheon/keyboard/options.txt  <RECENT-CTIME>  \n",
       "445  /usr/lib/thunderbird-addons/extensions/langpac...  <RECENT-CTIME>  \n",
       "396                   /usr/sbin/update-software-center  <RECENT-CTIME>  \n",
       "449                                /usr/bin/net.samba3  <RECENT-CTIME>  \n",
       "397                   /usr/lib/postgresql/9.1/bin/psql  <RECENT-CTIME>  \n",
       "398             /usr/lib/postgresql/9.1/bin/postmaster  <RECENT-CTIME>  \n",
       "452                  /usr/include/php5/main/snprintf.h  <RECENT-CTIME>  \n",
       "440                        /usr/share/php/XML/Util.php  <RECENT-CTIME>  "
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "nonlibraries.sort_values('ctime', ascending=False)[:10]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Okay, cool, it says that I I installed ddd recently. And postgresql! I remember installing those things. Neat."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The whole message here is that if you have a timestamp in seconds or milliseconds or nanoseconds, then you can just \"cast\" it to a `'datetime64[the-right-thing]'` and pandas/numpy will take care of the rest."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<style>\n",
    "    @font-face {\n",
    "        font-family: \"Computer Modern\";\n",
    "        src: url('http://mirrors.ctan.org/fonts/cm-unicode/fonts/otf/cmunss.otf');\n",
    "    }\n",
    "    div.cell{\n",
    "        width:800px;\n",
    "        margin-left:16% !important;\n",
    "        margin-right:auto;\n",
    "    }\n",
    "    h1 {\n",
    "        font-family: Helvetica, serif;\n",
    "    }\n",
    "    h4{\n",
    "        margin-top:12px;\n",
    "        margin-bottom: 3px;\n",
    "       }\n",
    "    div.text_cell_render{\n",
    "        font-family: Computer Modern, \"Helvetica Neue\", Arial, Helvetica, Geneva, sans-serif;\n",
    "        line-height: 145%;\n",
    "        font-size: 130%;\n",
    "        width:800px;\n",
    "        margin-left:auto;\n",
    "        margin-right:auto;\n",
    "    }\n",
    "    .CodeMirror{\n",
    "            font-family: \"Source Code Pro\", source-code-pro,Consolas, monospace;\n",
    "    }\n",
    "    .text_cell_render h5 {\n",
    "        font-weight: 300;\n",
    "        font-size: 22pt;\n",
    "        color: #4057A1;\n",
    "        font-style: italic;\n",
    "        margin-bottom: .5em;\n",
    "        margin-top: 0.5em;\n",
    "        display: block;\n",
    "    }\n",
    "    \n",
    "    .warning{\n",
    "        color: rgb( 240, 20, 20 )\n",
    "        }  "
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
